Abstract

In this study, two homology models of the main proteinase (Mpro) from the novel coronavirus associated with severe acute respiratory syndrome (SARS-CoV) were constructed. These models reveal three distinct functional domains, with an intervening loop connecting domains II and III and a catalytic cleft between domains I and II that contains the substrate binding subsites S1 and S2. During 200 ps molecular dynamics simulations, S2 exhibits larger structural variations than S1, because it is located at the open mouth of the catalytic cleft and the amino acid residues lining it are the least conserved. In addition, the greater structural variation of S2 makes it flexible enough to accommodate a bulky hydrophobic residue from the substrate.

© 2004 Elsevier B.V. All rights reserved.


1. Introduction 

Coronaviruses belong to a diverse group of positive-stranded RNA viruses and share a similar genome organization and common transcriptional/translational processes with the Arteriviridae [1,2]. The replicase gene of the human coronavirus HCoV-229E encodes two overlapping polyproteins [3] that mediate all the functions required for viral replication and transcription [4]. The functional polypeptides are released from the polyproteins by extensive proteolytic processing, which is primarily achieved by the 33.1-kDa main proteinase (Mpro) [5]. Mpro from HCoV-229E (MproH) has been expressed in Escherichia coli, and its enzymatic properties have been well characterized [5,6].

Several studies have revealed significant differences in both the active sites and the domain structures of the main proteinases of coronaviruses and picornaviruses [6-8]. Previous experimental data have shown that the differential cleavage kinetics observed in all coronaviruses are a conserved feature of Mpro [9]. Furthermore, as deduced from the genome sequence [11,12], the cleavage pattern appears to be conserved between Mpro from SARS-CoV (MproS) and that of other coronaviruses [10]. The functional importance of Mpro in the viral life cycle makes it an attractive target for the development of drugs directed against SARS and other coronavirus infections. Thus, screening known proteinase inhibitor libraries may be a valuable shortcut to the discovery of anti-SARS drugs [13]. The crystal structures of MproH [10] and of Mpro from porcine coronavirus (transmissible gastroenteritis virus, TGEV) (MproT) complexed with its inhibitor [14] have been determined. Comparison of these structures reveals a remarkable degree of structural conservation.

Previously, several molecular dynamics (MD) simulations, homology modeling studies, and molecular docking experiments have been conducted in our group [15-18]. In this Letter, two homology models of MproS (denoted MproSH and MproST) were constructed based on the crystal structures of MproH [10] and MproT [14], respectively. In addition, MD simulations were performed to investigate the dynamic behavior of these structures.


2. Methods 

2.1. Template proteins

The atomic coordinates of MproT and MproH were obtained from the Protein Data Bank (PDB entries 1LVO and 1P9U, respectively). Unfavorable non-physical contacts in these structures were eliminated using the Biopolymer module of Insight II (Accelrys, San Diego, CA, USA) with the CVFF forcefield [19] on an SGI O2+ workstation with a 64-bit MIPS RISC R12000 270 MHz CPU and a PMC-Sierra RM7000A 350 MHz processor (Silicon Graphics, Inc., Mountain View, CA, USA), followed by 10 000 steps of energy minimization using the steepest descent method.
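Steepest descent, as used for the 10 000 minimization steps above, repeatedly moves the atoms a short distance along the negative energy gradient. The minimal Python sketch below illustrates the idea under stated assumptions: a toy Lennard-Jones pair potential (ε = 0.2 kcal/mol and σ = 3.4 Å, chosen only for illustration) stands in for the CVFF forcefield, and a simple adaptive step size replaces whatever step control the Insight II modules use. The functions `lj_energy_and_gradient` and `steepest_descent` are hypothetical helpers, not Insight II commands.

```python
import numpy as np


def lj_energy_and_gradient(coords, epsilon=0.2, sigma=3.4):
    """Toy Lennard-Jones energy (kcal/mol) and gradient for an (N, 3) array in Å.

    Illustrative stand-in for a real forcefield such as CVFF (assumption).
    """
    energy = 0.0
    grad = np.zeros_like(coords)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            rij = coords[i] - coords[j]
            r = np.linalg.norm(rij)
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 ** 2 - sr6)
            # dE/dr of the pair term, projected onto the i-j direction
            de_dr = 4.0 * epsilon * (-12.0 * sr6 ** 2 + 6.0 * sr6) / r
            g = de_dr * rij / r
            grad[i] += g
            grad[j] -= g
    return energy, grad


def steepest_descent(coords, n_steps=10000, step=0.01, grad_tol=1e-4):
    """Move along -gradient; grow the step after a success, halve it after a failure."""
    x = np.array(coords, dtype=float)
    e, g = lj_energy_and_gradient(x)
    for _ in range(n_steps):
        gnorm = np.linalg.norm(g)
        if gnorm < grad_tol:
            break  # converged: residual forces are essentially zero
        trial = x - step * g / gnorm  # move a distance `step` downhill
        e_trial, g_trial = lj_energy_and_gradient(trial)
        if e_trial < e:
            x, e, g = trial, e_trial, g_trial
            step *= 1.2
        else:
            step *= 0.5
    return x, e


x0 = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
x_min, e_min = steepest_descent(x0)
print(np.linalg.norm(x_min[1] - x_min[0]), e_min)
```

For two atoms started 3.0 Å apart, the minimizer should settle near the Lennard-Jones minimum at 2^(1/6)σ ≈ 3.82 Å with an energy close to -ε.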

2.2. Structural homology 

The procedures for amino acid sequence alignment and homology modeling have been described previously [18]. The newly built homology models were then extensively refined, using the molecular mechanics and dynamics features of the Discover module, to remove van der Waals overlaps, unfavorable interatomic distances, and undesirable torsion angles.

2.3. Molecular dynamics simulations 

The MD simulations were performed with the CVFF forcefield [19]. The crystal structures of MproH and MproT and the homology models MproSH and MproST were first energy-minimized. Each energy-minimized structure was placed at the center of a 50 × 60 × 85 Å box filled with 6222, 5866, 5836, and 5776 water molecules for the MproH, MproT, MproSH, and MproST systems, respectively. To relax the arrangement of the soaking water molecules, the water molecules alone were subjected to 10 000 iterations of conjugate gradient minimization while the protein atoms were kept fixed. The system composed of the minimized protein and water molecules was then used as the starting configuration. Finally, a 200 ps MD simulation, including a 5 ps equilibration step, was carried out for each system using the Discover module of Insight II. Explicit-image periodic boundary conditions (PBC) were applied to the solvent. The temperature and pressure of each MD simulation were maintained at 300 K and 1 atm, respectively, using the method of Berendsen et al. [20]. A cutoff radius of 10 Å was applied to the non-bonded interactions. The time step of the MD simulations was 1 fs, and the trajectories and coordinates of these structures were recorded every 2 ps for further analysis.
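The run-control numbers above follow from simple arithmetic, and the Berendsen scheme [20] holds the temperature near its target by rescaling all velocities at each step by λ = [1 + (Δt/τ_T)(T0/T - 1)]^(1/2), where τ_T is a coupling time. The sketch below assumes τ_T = 0.1 ps, which the Letter does not report; it applies only the thermostat (no forces), omits the analogous pressure coupling, and does not say whether the 5 ps equilibration is counted inside the 200 ps. The helper names are hypothetical.

```python
import math

TIME_STEP_FS = 1.0      # MD time step given in the Methods
RUN_LENGTH_PS = 200.0   # simulation length given in the Methods
SAVE_EVERY_PS = 2.0     # coordinate recording interval given in the Methods

n_steps = round(RUN_LENGTH_PS * 1000.0 / TIME_STEP_FS)  # 1 ps = 1000 fs
n_intervals = round(RUN_LENGTH_PS / SAVE_EVERY_PS)      # saved frames, excluding t = 0


def berendsen_lambda(t_current, t_target=300.0, dt_ps=0.001, tau_ps=0.1):
    """Berendsen weak-coupling velocity scale factor.

    tau_ps (the coupling time) is an assumed value; the Letter does not give it.
    """
    return math.sqrt(1.0 + (dt_ps / tau_ps) * (t_target / t_current - 1.0))


def relax_temperature(t_start, n, **kwargs):
    """Apply only the thermostat for n steps (no forces) to show the relaxation."""
    t = t_start
    for _ in range(n):
        # kinetic temperature scales with the square of the velocity scale factor
        t *= berendsen_lambda(t, **kwargs) ** 2
    return t


print(n_steps, n_intervals)            # 200000 100
print(relax_temperature(310.0, 1000))  # close to 300 K after 1 ps (10 coupling times)
```

Each thermostat step moves the temperature a fraction Δt/τ_T of the way toward T0, so deviations decay exponentially with time constant τ_T rather than being removed abruptly.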

3. Results and discussion 

3.1. Amino acid sequence alignment 

The amino acid sequence alignment of MproS with MproT and MproH is shown in Fig. 1. MproT and MproH both lack the residue corresponding to Ala46 in domain I of MproS and the residues corresponding to Asp248, Ile249, and Gln273 in domain III of MproS. In addition, MproS has one and two extra residues at its C-terminus compared with MproT and MproH, respectively. Of the three domains, domain III exhibits the highest sequence variation. Both the general acid-base catalyst (the His residue in domain I) and the nucleophile (the Cys residue in domain II) are fully conserved in all three proteins.

Fig. 1. Amino acid sequence alignment of MproT, MproH, and MproS. Secondary structures defined in the crystal structure of MproT are shown on top.

Table 1 lists the percentages of amino acid identity among these proteins. MproT and MproH show the highest total amino acid identity (60.80%), whereas MproH and MproS exhibit the lowest (40.19%). In addition, domain II has the highest and domain III the lowest amino acid identity among the three proteins. The low sequence identities between MproS and MproT and between MproS and MproH found here are in good agreement with previous results [21], in which SARS-CoV was classified as a new group of coronavirus based on analysis of the deduced genome sequence.

Table 1
The amino acid sequence identities among MproH, MproT, and MproS

                     Identity (%)
                     Total    Domain I    Domain II    Domain III
MproH and MproT      60.80    63.44       65.06        55.45
MproH and MproS      40.19    41.94       45.78        35.64
MproT and MproS      43.85    44.09       49.40        39.22
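Identity percentages such as those in Table 1 are computed from a pairwise alignment as the number of identical aligned positions divided by a reference length. The Letter does not state which denominator was used; the sketch below assumes positions where either sequence has a gap are excluded, and per-domain values are obtained by slicing the alignment at domain boundaries. The toy sequences and the slice boundaries in the usage are hypothetical, not real Mpro data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two equal-length aligned sequences.

    Assumption: columns where either sequence has a gap are excluded from
    the denominator; other conventions divide by the shorter sequence
    length or by the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def domain_identity(aligned_a: str, aligned_b: str, start: int, end: int) -> float:
    """Identity over alignment columns [start, end), e.g. one domain (hypothetical bounds)."""
    return percent_identity(aligned_a[start:end], aligned_b[start:end])


# Hypothetical toy alignment (not real Mpro sequences)
a = "SGFRKMA-F"
b = "SGLRKMAQF"
print(percent_identity(a, b))  # 87.5
print(domain_identity(a, b, 0, 3))
```

Because the denominator convention changes the numbers, identity values are only comparable when computed the same way, which is one reason to report the total and per-domain values side by side as in Table 1.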

3.2. The homology models of MproST and MproSH

The homology models of MproST and MproSH are illustrated in Figs. 2a and b, respectively. Both MproST and MproSH exhibit three distinct domains and adopt folds similar to those of MproT and MproH, respectively. These models are comparable to the homology models constructed previously [10,13]. The geometric and stereochemical quality of these homology models was further validated using the Homology/ProStat/Struct_Check command of Insight II. For MproST and MproSH, 97% and 96% of the backbone dihedral angles (φ and ψ), respectively, lie within the structurally favorable regions of the Ramachandran plot. Calculation of the χ1 and χ2 torsion angles of these models showed no severe geometric distortion.
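The Ramachandran check rests on the backbone dihedrals φ (C(i-1)-N-Cα-C) and ψ (N-Cα-C-N(i+1)). A minimal numpy sketch of the signed dihedral calculation follows, using the IUPAC sign convention; the coordinates in the usage are hypothetical unit-length points, and the classification of (φ, ψ) pairs into favorable regions, as done by Struct_Check, is not reproduced here.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, range (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # components of the outer bonds perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone (phi, psi) of residue i from C(i-1), N(i), CA(i), C(i), N(i+1)."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


# Hypothetical coordinates: a cis (0°) and a -90° arrangement
print(phi_psi([1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]))
```

Using `arctan2` rather than `arccos` keeps the sign of the angle, which matters because the favorable Ramachandran regions are not symmetric about zero.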

The putative substrate binding subsites S1 and S2 of MproST and MproSH are located in a cleft between domains I and II and are nearly identical to those of MproT and MproH (Fig. 2). This indicates that MproS may follow substrate binding mechanisms similar to those of MproT and MproH, which would allow anti-SARS drugs to be designed by screening known proteinase inhibitors. The low conservation of both sequence and secondary structure in domain III among these proteins suggests that this domain may play only a minor role in proteolytic activity. As shown in Table 2, both models deviate less from MproT (3.94 Å for MproSH and 4.37 Å for MproST) than from MproH (4.51 and 4.84 Å, respectively), while the RMSD between MproSH and MproST is 5.78 Å. This indicates that the structure of MproS is more similar to that of MproT.

Fig. 2. The homology models of (a) MproST and (b) MproSH visualized by Insight II. α-Helices and β-strands are shown as red cylinders and yellow arrows, respectively. The general acid-base catalyst His residue and the nucleophilic Cys residue are labeled. The locations of the putative substrate binding subsites S1 and S2 are indicated.

Table 2
The RMSDs (Å) between the template proteins, MproH and MproT, and the homology models, MproSH and MproST

            MproH    MproT    MproSH
MproT       2.01     -        -
MproSH      4.51     3.94     -
MproST      4.84     4.37     5.78
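The values in Table 2 are RMSDs after structural superposition. The Letter does not describe how atoms were paired or how the fit was done; a common choice is to superimpose matched Cα atoms (dropping residues that are gapped in the alignment) with the Kabsch algorithm and then report the root-mean-square distance. The sketch below assumes the two coordinate arrays are already matched row by row; `kabsch_rmsd` is a hypothetical helper, not an Insight II function.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition.

    Assumes row i of P and row i of Q are the same (aligned) atom, e.g. a Cα.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q             # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # flip the last axis if needed so R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


rng = np.random.default_rng(0)
P = rng.normal(scale=5.0, size=(20, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([3.0, -2.0, 1.0])  # same structure, rotated and shifted
print(kabsch_rmsd(P, Q))  # essentially zero
```

A rigidly moved copy gives an RMSD of essentially zero, so any nonzero value reflects genuine conformational differences between the matched atoms.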

3.3. Molecular dynamics simulations 

As shown in Fig. 3, these structures remained stable during the MD time course, with root-mean-square deviations (RMSDs) remaining within 3 Å. In all cases, domain III exhibits larger structural variations than the other two domains. S1 maintained its structural integrity, whereas S2 exhibited larger structural fluctuations throughout the MD simulations. This is attributed to the location of S2 at the open mouth of the catalytic cleft between domains I and II, whereas S1 lies at the very bottom of the cleft and is well protected by the hydrophobic core. The greater structural variation of S2 makes it flexible enough to accommodate a bulky hydrophobic residue from the substrate.
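Per-region curves like those in Fig. 3 can be produced by superimposing each saved frame on a reference and evaluating the RMSD over a subset of atoms. The sketch below assumes the trajectory is an array of Cα coordinates with shape (n_frames, n_atoms, 3), uses the first frame as the reference, and fits on a chosen set of atoms (the whole protein by default) before measuring the region of interest. The Letter does not specify its fitting protocol, and the index ranges standing in for a domain or subsite are hypothetical.

```python
import numpy as np


def fit_transform(mobile, reference):
    """Kabsch rotation R and centroids that superimpose `mobile` onto `reference`."""
    mc = mobile.mean(axis=0)
    rc = reference.mean(axis=0)
    H = (mobile - mc).T @ (reference - rc)
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mc, rc


def region_rmsd_series(traj, region_idx, fit_idx=None):
    """RMSD of atoms `region_idx` in every frame relative to frame 0.

    traj: (n_frames, n_atoms, 3) array of Cα coordinates (assumed layout).
    fit_idx: atoms used for the superposition; defaults to all atoms.
    """
    traj = np.asarray(traj, dtype=float)
    ref = traj[0]
    fit_idx = np.arange(traj.shape[1]) if fit_idx is None else np.asarray(fit_idx)
    region_idx = np.asarray(region_idx)
    series = []
    for frame in traj:
        R, mc, rc = fit_transform(frame[fit_idx], ref[fit_idx])
        fitted = (frame - mc) @ R.T + rc  # same rigid-body move applied to all atoms
        diff = fitted[region_idx] - ref[region_idx]
        series.append(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
    return np.array(series)


# Synthetic 3-frame trajectory: reference, rigidly moved copy, locally perturbed copy
rng = np.random.default_rng(1)
ref = rng.normal(scale=8.0, size=(30, 3))
theta = 0.5
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
rotated = ref @ Rz.T + np.array([1.0, 2.0, 3.0])
shifted = ref.copy()
shifted[:5] += np.array([1.0, 0.0, 0.0])  # displace a hypothetical 5-atom "subsite"
traj = np.stack([ref, rotated, shifted])
print(region_rmsd_series(traj, region_idx=np.arange(5), fit_idx=np.arange(5, 30)))
```

Fitting on a stable core and measuring a flexible region (rather than fitting on the region itself) keeps overall tumbling out of the curve while still exposing local motion, which is the kind of contrast drawn between S1 and S2 above.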

In the crystal structures, the distance between the sulfur atom of Cys144 and Nε2 of His41 in MproT is 4.05 Å [14], longer than the corresponding Cys-His distances in HAV 3Cpro (3.92 Å) [22], poliovirus (PV) 3Cpro (3.4 Å) [23], and papain (3.65 Å) [24]. From a dynamic point of view (Fig. 4), the Cys144-His41 distance of MproH fluctuated more rapidly than that of MproT. Likewise, beyond 150 ps the Cys145-His41 distance of MproSH fluctuated more rapidly than that of MproST. These results indicate that MproT and MproST may have more stable active site configurations than MproH and MproSH, respectively. The large fluctuations of these Cys-His distances may indicate that the catalytic site is unstable in the absence of a bound substrate or ligand. This result is in good agreement with previous findings of significant differences in the flexibility of the active site of the SARS-CoV proteinase [25]. Furthermore, the high flexibility of the active site may allow these proteins to carry out the catalytic process more efficiently.

Fig. 3. The RMSDs of the backbone Cα atoms for (a) the whole protein, (b) domain I, (c) domain II, (d) domain III, (e) S1, and (f) S2 of MproT, MproH, MproST, and MproSH during the MD simulations.

Fig. 4. The distance between the sulfur atom of the nucleophilic Cys residue and Nε2 of the general acid-base catalyst His residue as a function of MD simulation time.
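The Cys-His distances in Fig. 4 are Euclidean distances between two atoms tracked over the saved frames. The Letter compares the curves visually; the sketch below is one way to quantify them, assuming coordinates are available as (n_frames, 3) arrays for the Cys sulfur and the His Nε2, frames are saved every 2 ps as stated in the Methods, and "fluctuation" is summarized by the standard deviation (amplitude) and the mean frame-to-frame change (rapidity) before and after 150 ps. These metrics, the helper names, and the synthetic data in the usage are assumptions.

```python
import numpy as np


def atom_distance_series(s_xyz, ne2_xyz):
    """Per-frame distance (Å) between the Cys sulfur and His Nε2; inputs are (n_frames, 3)."""
    s = np.asarray(s_xyz, dtype=float)
    ne2 = np.asarray(ne2_xyz, dtype=float)
    return np.linalg.norm(s - ne2, axis=1)


def _segment_stats(seg):
    """Mean, standard deviation, and mean absolute frame-to-frame change of a segment."""
    step = float(np.abs(np.diff(seg)).mean()) if len(seg) > 1 else 0.0
    return float(seg.mean()), float(seg.std()), step


def fluctuation_summary(distances, save_every_ps=2.0, split_ps=150.0):
    """Summarize the distance series before and after `split_ps`."""
    d = np.asarray(distances, dtype=float)
    times = np.arange(len(d)) * save_every_ps
    summary = {}
    for label, seg in (("early", d[times < split_ps]), ("late", d[times >= split_ps])):
        mean, std, step = _segment_stats(seg)
        summary.update({f"{label}_mean": mean, f"{label}_std": std, f"{label}_step": step})
    return summary


# Synthetic example: 101 frames (0-200 ps), steady at 4.0 Å, then alternating after 150 ps
d = np.full(101, 4.0)
d[75:] = np.where(np.arange(26) % 2 == 0, 3.8, 4.2)
s_xyz = np.zeros((101, 3))
ne2_xyz = s_xyz + np.column_stack((d, np.zeros(101), np.zeros(101)))
print(fluctuation_summary(atom_distance_series(s_xyz, ne2_xyz)))
```

Reporting both statistics separates a distance that wanders over a wide range from one that jumps quickly between nearby values, two behaviors a plot like Fig. 4 can blur together.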


It has been shown previously that, as in 3Cpro [23,24], specific substrate binding by Mpro is ensured by the well-defined S1 and S2 binding subsites [14]. In both MproT and MproH, S2 is lined by the side chains of His41, Thr47, Ile51, Leu164, and Pro188, except that Leu164 of MproT is replaced by Ile in MproH. In MproS, S2 is lined by the side chains of His41, Asp48, Pro52, Met165, and Gln189. This indicates that S2 is less conserved than S1 among these proteins. It is worth noting that the main chain of Leu164 of MproT (or Ile164 of MproH, or Met165 of MproS) forms part of S1, while its side chain is involved in S2, indicating that these two subsites influence each other to some extent in substrate binding.

Analysis of the accessible surface areas (ASAs) of S1 and S2 during the MD simulations indicates that both subsites are flexible enough to accommodate the substrates. Snapshots of S1 and S2 for these proteins with the smallest and largest ASAs sampled from the 200 ps MD simulations are illustrated in Fig. 5. Interestingly, the sizes and conformations of the smallest and largest S1 pockets of MproSH are very similar to those of MproT. During the MD simulations, the size and conformation of S2 vary more than those of S1 for all these proteins, probably because part of S2 is fully exposed to the solvent.

Fig. 5. Molecular surfaces of the substrate binding subsites S1 and S2 for MproT, MproH, MproST, and MproSH with the smallest and the largest ASAs during the MD simulations. The residues forming these subsites are shown in red.
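The Letter does not say how the ASAs were computed. One common approach is the Shrake-Rupley algorithm: place test points on a sphere of radius (van der Waals radius + probe radius) around each atom and keep the fraction not buried inside any neighbor's expanded sphere. The minimal sketch below assumes a 1.4 Å water probe and caller-supplied radii; a subsite ASA would then be the sum over the atoms of the residues lining S1 or S2, computed in the context of the whole protein. The helper names are hypothetical.

```python
import numpy as np


def sphere_points(n=960):
    """Approximately uniform unit-sphere points from a golden-section spiral."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z * z)
    phi = np.pi * (1.0 + 5 ** 0.5) * i
    return np.column_stack((r * np.cos(phi), r * np.sin(phi), z))


def shrake_rupley_asa(coords, radii, probe=1.4, n_points=960):
    """Per-atom solvent-accessible surface area (Å^2), Shrake-Rupley style.

    coords: (N, 3) in Å; radii: (N,) van der Waals radii in Å (assumed inputs).
    """
    coords = np.asarray(coords, dtype=float)
    expanded = np.asarray(radii, dtype=float) + probe
    unit = sphere_points(n_points)
    asa = np.zeros(len(coords))
    for i, (center, r_i) in enumerate(zip(coords, expanded)):
        pts = center + r_i * unit  # test points on atom i's expanded sphere
        accessible = np.ones(n_points, dtype=bool)
        for j in range(len(coords)):
            # only neighbors whose expanded spheres can overlap atom i's matter
            if j == i or np.linalg.norm(coords[j] - center) >= r_i + expanded[j]:
                continue
            accessible &= np.sum((pts - coords[j]) ** 2, axis=1) >= expanded[j] ** 2
        asa[i] = 4.0 * np.pi * r_i ** 2 * accessible.mean()
    return asa


print(shrake_rupley_asa([[0.0, 0.0, 0.0], [0.0, 0.0, 3.0]], [1.6, 1.6]))
```

For two expanded spheres of radius 3.0 Å whose centers are 3.0 Å apart, one quarter of each surface is buried, so each atom should report about 27π ≈ 84.8 Å². The double loop is O(N²) and fine for a sketch; production codes use neighbor lists.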